GCC 4.4 Release Series -- Changes, New Features, and Fixes
==========================================================


Caveats
=======
  
- __builtin_stdarg_start has been completely removed from GCC.
  Support for <varargs.h> had been deprecated since GCC 4.0.  Use
  __builtin_va_start as a replacement.

- Some of the errors issued by the C++ front end that could be
  downgraded to warnings in previous releases by using -fpermissive
  are now warnings by default. They can be converted into errors by
  using -pedantic-errors.

- Use of the cpp assertion extension will now emit a warning when
  -Wdeprecated or -pedantic is used.  This extension has been
  deprecated for many years, but never warned about.

- Packed bit-fields of type char were not properly bit-packed on many
  targets prior to GCC 4.4.  On these targets, the fix in GCC 4.4
  causes an ABI change.  For example there is no longer a 4-bit
  padding between field a and b in this structure:
    
    struct foo
    {
      char a:4;
      char b:8;
    } __attribute__ ((packed));

  There is a new warning to help identify fields that are affected:
    
    foo.c:5: note: Offset of packed bit-field 'b' has changed in GCC 4.4

  The warning can be disabled with -Wno-packed-bitfield-compat.

- On ARM EABI targets, the C++ mangling of the va_list type has been
  changed to conform to the current revision of the EABI.  This does
  not affect the libstdc++ library included with GCC.

- The SCOUNT and POS bits of the MIPS DSP control register are now
  treated as global. Previous versions of GCC treated these fields as
  call-clobbered instead.

- The MIPS port no longer recognizes the h asm constraint.  It was
  necessary to remove this constraint in order to avoid generating
  unpredictable code sequences.

  One of the main uses of the h constraint was to extract the high
  part of a multiplication on 64-bit targets.  For example:
    
    asm ("dmultu\t%1,%2" : "=h" (result) : "r" (x), "r" (y));

  You can now achieve the same effect using 128-bit types:
    
    typedef unsigned int uint128_t __attribute__((mode(TI)));
    result = ((uint128_t) x * y) >> 64;

- The second sequence is better in many ways.  For example, if x and y
  are constants, the compiler can perform the multiplication at
  compile time.  If x and y are not constants, the compiler can
  schedule the runtime multiplication better than it can schedule an
  asm statement.
    
- Support for a number of older systems and recently unmaintained or
  untested target ports of GCC has been declared obsolete in GCC 4.4.
  Unless there is activity to revive them, the next release of GCC
  will have their sources permanently removed.

- The following ports for individual systems on particular
  architectures have been obsoleted:
    
  - Generic a.out on IA32 and m68k (i[34567]86-*-aout*, m68k-*-aout*)
  - Generic COFF on ARM, H8300, IA32, m68k and SH (arm-*-coff*,
    armel-*-coff*, h8300-*-*, i[34567]86-*-coff*, m68k-*-coff*,
    sh-*-*).  This does not affect other more specific targets using
    the COFF object format on those architectures, or the more
    specific H8300 and SH targets (h8300-*-rtems*, h8300-*-elf*,
    sh-*-elf*, sh-*-symbianelf*, sh-*-linux*, sh-*-netbsdelf*,
    sh-*-rtems*, sh-wrs-vxworks).
  - 2BSD on PDP-11 (pdp11-*-bsd)
  - AIX 4.1 and 4.2 on PowerPC (rs6000-ibm-aix4.[12]*,
    powerpc-ibm-aix4.[12]*)
  - Tuning support for Itanium1 (Merced) variants.  Note that code
    tuned for Itanium2 should also run correctly on Itanium1.
    
- The protoize and unprotoize utilities have been obsoleted and will
  be removed in GCC 4.5.  These utilities have not been installed by
  default since GCC 3.0.

- Support has been removed for all the configurations obsoleted in
  GCC 4.3.

- Unknown -Wno-* options are now silently ignored by GCC if no other
  diagnostics are issued. If other diagnostics are issued, then GCC
  warns about the unknown options.
   
- More information on porting to GCC 4.4 from previous versions of GCC
  can be found in the porting guide for this release.
 

General Optimizer Improvements
==============================

- A new command-line switch -findirect-inlining has been added.  When
  turned on it allows the inliner to also inline indirect calls that
  are discovered to have known targets at compile time thanks to
  previous inlining.

- A new command-line switch -ftree-switch-conversion has been added.
  This new pass turns simple initializations of scalar variables in
  switch statements into initializations from a static array, given
  that all the values are known at compile time and the ratio between
  the new array size and the original switch branches does not exceed
  the parameter --param switch-conversion-max-branch-ratio (default is
  eight).

- A new command-line switch -ftree-builtin-call-dce has been added.
  This optimization eliminates unnecessary calls to certain builtin
  functions when the return value is not used, in cases where the
  calls can not be eliminated entirely because the function may set
  errno.  This optimization is on by default at -O2 and above.

- A new command-line switch -fconserve-stack directs the compiler to
  minimize stack usage even if it makes the generated code slower.
  This affects inlining decisions.

- When the assembler supports it, the compiler will now emit unwind
  information using assembler .cfi directives.  This makes it possible
  to use such directives in inline assembler code.  The new option
  -fno-dwarf2-cfi-asm directs the compiler to not use .cfi directives.

- The Graphite branch has been merged.  This merge has brought in a
  new framework for loop optimizations based on a polyhedral
  intermediate representation.  These optimizations apply to all the
  languages supported by GCC.  The following new code transformations
  are available in GCC 4.4:
      
  - -floop-interchange performs loop interchange transformations on
    loops.  Interchanging two nested loops switches the inner and
    outer loops.  For example, given a loop like:
	  
      DO J = 1, M
        DO I = 1, N
          A(J, I) = A(J, I) * C
        ENDDO
      ENDDO
  
    loop interchange will transform the loop as if the user had written:
      
      DO I = 1, N
        DO J = 1, M
          A(J, I) = A(J, I) * C
        ENDDO
      ENDDO
	      
    which can be beneficial when N is larger than the caches, because
    in Fortran, the elements of an array are stored in memory
    contiguously by column, and the original loop iterates over rows,
    potentially creating at each access a cache miss.
	
  - -floop-strip-mine performs loop strip mining transformations on
    loops.  Strip mining splits a loop into two nested loops.  The
    outer loop has strides equal to the strip size and the inner loop
    has strides of the original loop within a strip.  For example,
    given a loop like:

      DO I = 1, N
        A(I) = A(I) + C
      ENDDO

    loop strip mining will transform the loop as if the user had written:
	    
      DO II = 1, N, 4
        DO I = II, min (II + 3, N)
          A(I) = A(I) + C
        ENDDO
      ENDDO

  - -floop-block performs loop blocking transformations on loops.
    Blocking strip mines each loop in the loop nest such that the
    memory accesses of the element loops fit inside caches.  For
    example, given a loop like:

      DO I = 1, N
        DO J = 1, M
          A(J, I) = B(I) + C(J)
        ENDDO
      ENDDO
	  
    loop blocking will transform the loop as if the user had written:
	  
      DO II = 1, N, 64
        DO JJ = 1, M, 64
          DO I = II, min (II + 63, N)
            DO J = JJ, min (JJ + 63, M)
              A(J, I) = B(I) + C(J)
            ENDDO
          ENDDO
        ENDDO
      ENDDO
	  
    which can be beneficial when M is larger than the caches, because
    the innermost loop will iterate over a smaller amount of data that
    can be kept in the caches.

- A new register allocator has replaced the old one.  It is called
  integrated register allocator (IRA) because coalescing, register
  live range splitting, and hard register preferencing are done
  on-the-fly during coloring.  It also has better integration with the
  reload pass.  IRA is a regional register allocator which uses modern
  Chaitin-Briggs coloring instead of Chow's priority coloring used in
  the old register allocator.  More info about IRA internals and
  options can be found in the GCC manuals.
    
- A new instruction scheduler and software pipeliner, based on the
  selective scheduling approach, has been added.  The new pass
  performs instruction unification, register renaming, substitution
  through register copies, and speculation during scheduling.  The
  software pipeliner is able to pipeline non-countable loops.  The new
  pass is targeted at scheduling-eager in-order platforms.  In GCC 4.4
  it is available for the Intel Itanium platform working by default as
  the second scheduling pass (after register allocation) at the -O3
  optimization level.

- When using -fprofile-generate with a multi-threaded program, the
  profile counts may be slightly wrong due to race conditions.  The
  new -fprofile-correction option directs the compiler to apply
  heuristics to smooth out the inconsistencies.  By default the
  compiler will give an error message when it finds an inconsistent
  profile.

- The new -fprofile-dir=PATH option permits setting the directory
  where profile data files are stored when using -fprofile-generate
  and friends, and the directory used when reading profile data files
  using -fprofile-use and friends.


New warning options
===================

- The new -Wframe-larger-than=NUMBER option directs GCC to emit a
  warning if any stack frame is larger than NUMBER bytes.  This may be
  used to help ensure that code fits within a limited amount of stack
  space.

- The new -Wno-mudflap option disables warnings about constructs which
  can not be instrumented when using -fmudflap.


New Languages and Language specific improvements
================================================

- Version 3.0 of the OpenMP specification is now supported for the C,
  C++, and Fortran compilers.


C family
--------

- A new optimize attribute was added to allow programmers to change
  the optimization level and particular optimization options for an
  individual function.  You can also change the optimization options
  via the GCC optimize pragma for functions defined after the pragma.
  The GCC push_options pragma and the GCC pop_options pragma allow you
  temporarily save and restore the options used.  The GCC
  reset_options pragma restores the options to what was specified on
  the command line.

- Uninitialized warnings do not require enabling optimization anymore,
  that is, -Wuninitialized can be used together with -O0.
  Nonetheless, the warnings given by -Wuninitialized will probably be
  more accurate if optimization is enabled.

- -Wparentheses now warns about expressions such as (!x | y) and (!x & y).
  Using explicit parentheses, such as in ((!x) | y), silences this warning.

- -Wsequence-points now warns within if, while,do while and for
  conditions, and within for begin/end expressions.

- A new option -dU is available to dump definitions of preprocessor
  macros that are tested or expanded.


C++
---
  
- Improved experimental support for the upcoming ISO C++ standard,
  C++0x. Including support for auto, inline namespaces, generalized
  initializer lists, defaulted and deleted functions, new character
  types, and scoped enums.

- Those errors that may be downgraded to warnings to build legacy code
  now mention -fpermissive when -fdiagnostics-show-option is enabled.

- -Wconversion now warns if the result of a static_cast to enumeral
  type is unspecified because the value is outside the range of the
  enumeral type.

- -Wuninitialized now warns if a non-static reference or non-static
  const member appears in a class without constructors.

- G++ now properly implements value-initialization, so objects with an
  initializer of () and an implicitly defined default constructor will
  be zero-initialized before the default constructor is called.


Runtime Library (libstdc++)
---------------------------

- Improved experimental support for the upcoming ISO C++ standard,
  C++0x, including:
      
  - Support for <chrono>, <condition_variable>, <cstdatomic>,
    <forward_list>, <initializer_list>, <mutex>, <ratio>,
    <system_error>, and <thread>.

  - unique_ptr, <algorithm> additions, exception propagation, and
    support for the new character types in <string> and <limits>.

  - Existing facilities now exploit initializer lists, defaulted and
    deleted functions, and the newly implemented core C++0x features.

  - The standard containers are more efficient together with stateful
    allocators.

- Experimental support for non-standard pointer types in containers.
  The long standing libstdc++/30928 has been fixed for targets running
  glibc 2.10 or later.

- As usual, many small and larger bug fixes, in particular quite a few
  corner cases in <locale>.


Fortran
-------
  
- GNU Fortran now employs libcpp directly instead of using cc1 as an
  external preprocessor. The -cpp option was added to allow manual
  invocation of the preprocessor without relying on filename
  extensions.

- The -Warray-temporaries option warns about array temporaries
  generated by the compiler, as an aid to optimization.

- The -fcheck-array-temporaries option has been added, printing a
  notification at run time, when an array temporary had to be created
  for an function argument. Contrary to -Warray-temporaries the
  warning is only printed if the array is noncontiguous.

- Improved generation of DWARF debugging symbols

- If using an intrinsic not part of the selected standard (via -std=
  and -fall-intrinsics) gfortran will now treat it as if this
  procedure were declared EXTERNAL and try to link to a user-supplied
  procedure. -Wintrinsics-std will warn whenever this happens. The
  now-useless option -Wnonstd-intrinsic was removed.

- The flag -falign-commons has been added to control the alignment of
  variables in COMMON blocks, which is enabled by default in line with
  previous GCC version. Using -fno-align-commons one can force commons
  to be contiguous in memory as required by the Fortran standard,
  however, this slows down the memory access. The option
  -Walign-commons, which is enabled by default, warns when padding
  bytes were added for alignment. The proper solution is to sort the
  common objects by decreasing storage size, which avoids the
  alignment problems.

- Fortran 2003 support has been extended:

  - Wide characters (ISO 10646, UCS-4, kind=4) and UTF-8 I/O is now
    supported (except internal reads from/writes to wide strings).
    -fbackslash now supports also \unnnn and \Unnnnnnnn to enter
    Unicode characters.

  - Asynchronous I/O (implemented as synchronous I/O) and the
    decimal=, size=, sign=, pad=, blank=, and delim= specifiers are
    now supported in I/O statements.

  - Support for Fortran 2003 structure constructors and for array
    constructor with typespec has been added.

  - Procedure Pointers (but not yet as component in derived types and
    as function results) are now supported.

  - Abstract types, type extension, and type-bound procedures (both
    PROCEDURE and GENERIC but not as operators). Note: As
    CLASS/polymorphyic types are not implemented, type-bound
    procedures with PASS accept as non-standard extension TYPE
    arguments.
      
- Fortran 2008 support has been added:
      
  - The -std=f2008 option and support for the file extensions .f2008
    and .F2008 has been added.

  - The g0 format descriptor is now supported.

  - The Fortran 2008 mathematical intrinsics ASINH, ACOSH, ATANH, ERF,
    ERFC, GAMMA, LOG_GAMMA, BESSEL_*, HYPOT, and ERFC_SCALED are now
    available (some of them existed as GNU extension before). Note:
    The hyperbolic functions are not yet supporting complex arguments
    and the three- argument version of BESSEL_*N is not available.

  - The bit intrinsics LEADZ and TRAILZ have been added.


Java (GCJ)
----------


Ada
---

- The Ada runtime now supports multilibs on many platforms including
  x86_64, SPARC and PowerPC. Their build is enabled by default.


New Targets and Target Specific Improvements
============================================

ARM
---
  
- GCC now supports optimizing for the Cortex-A9, Cortex-R4 and
  Cortex-R4F processors and has many other improvements to
  optimization for ARM processors.

- GCC now supports the VFPv3 variant with 16 double-precision
  registers with -mfpu=vfpv3-d16.  The option -mfpu=vfp3 has been
  renamed to -mfpu=vfpv3.

- GCC now supports the -mfix-cortex-m3-ldrd option to work around an
  erratum on Cortex-M3 processors.

- GCC now supports the __sync_* atomic operations for ARM EABI GNU/Linux.

- The section anchors optimization is now enabled by default when
  optimizing for ARM.

- GCC now uses a new EABI-compatible profiling interface for EABI
  targets.  This requires a function __gnu_mcount_nc, which is
  provided by GNU libc versions 2.8 and later.
  

AVR
---
  
- The -mno-tablejump option has been deprecated because it has the
  same effect as the -fno-jump-tables option.

- Added support for these new AVR devices:
      
  - ATA6289
  - ATtiny13A
  - ATtiny87
  - ATtiny167
  - ATtiny327
  - ATmega8C1
  - ATmega16C1
  - ATmega32C1
  - ATmega8M1
  - ATmega16M1
  - ATmega32M1
  - ATmega32U4
  - ATmega16HVB
  - ATmega4HVD
  - ATmega8HVD
  - ATmega64C1
  - ATmega64M1
  - ATmega16U4
  - ATmega32U6
  - ATmega128RFA1
  - AT90PWM81
  - AT90SCR100
  - M3000F
  - M3000S
  - M3001B
  

IA-32/x86-64
------------
  
- Support for Intel AES built-in functions and code generation is
  available via -maes.

- Support for Intel PCLMUL built-in function and code generation is
  available via -mpclmul.

- Support for Intel AVX built-in functions and code generation is
  available via -mavx.

- Automatically align the stack for local variables with alignment
  requirement.

- GCC can now utilize the SVML library for vectorizing calls to a set
  of C99 functions if -mveclibabi=svml is specified and you link to an
  SVML ABI compatible library.

- A new target attribute was added to allow programmers to change the
  target options like -msse2 or -march=k8 for an individual function.
  You can also change the target options via the GCC target pragma for
  functions defined after the pragma.

- GCC can now be configured with options --with-arch-32,
  --with-arch-64, --with-cpu-32, --with-cpu-64, --with-tune-32 and
  --with-tune-64 to control the default optimization separately for
  32-bit and 64-bit modes.
  

IA-32/IA64
----------
  
- Support for __float128 (TFmode) IEEE quad type and corresponding
  TCmode IEEE complex quad type is available via the soft-fp library
  on IA-32/IA64 targets. This includes basic arithmetic operations
  (addition, subtraction, negation, multiplication and division) on
  __float128 real and TCmode complex values, the full set of IEEE
  comparisons between __float128 values, conversions to and from
  float, double and long double floating point types, as well as
  conversions to and from signed or unsigned integer, signed or
  unsigned long integer and signed or unsigned quad (TImode, IA64
  only) integer types.  Additionally, all operations generate the full
  set of IEEE exceptions and support the full set of IEEE rounding
  modes.
  

M68K/ColdFire
-------------
  
- GCC now supports instruction scheduling for ColdFire V1, V3 and V4
  processors.  (Scheduling support for ColdFire V2 processors was
  added in GCC 4.3.)  GCC now supports the -mxgot option to support
  programs requiring many GOT entries on ColdFire.  The
  m68k-*-linux-gnu target now builds multilibs by default.
  

MIPS
----

- MIPS Technologies have extended the original MIPS SVR4 ABI to
  include support for procedure linkage tables (PLTs) and copy
  relocations.  These extensions allow GNU/Linux executables to use a
  significantly more efficient code model than the one defined by the
  original ABI.

  GCC support for this code model is available via a new command-line
  option, -mplt.  There is also a new configure-time option,
  --with-mips-plt, to make -mplt the default.

  The new code model requires support from the assembler, the linker,
  and the runtime C library.  This support is available in binutils
  2.19 and GLIBC 2.9.

- GCC can now generate MIPS16 code for 32-bit GNU/Linux executables
  and 32-bit GNU/Linux shared libraries.  This feature requires GNU
  binutils 2.19 or above.

- Support for RMI's XLR processor is now available through the
  -march=xlr and -mtune=xlr options.

- 64-bit targets can now perform 128-bit multiplications inline,
  instead of relying on a libgcc function.

- Native GNU/Linux toolchains now support -march=native and
  -mtune=native, which select the host processor.

- GCC now supports the R10K, R12K, R14K and R16K processors.  The
  canonical -march= and -mtune= names for these processors are r10000,
  r12000, r14000 and r16000 respectively.

- GCC can now work around the side effects of speculative execution on
  R10K processors.  Please see the documentation of the
  -mr10k-cache-barrier option for details.

- Support for the MIPS64 Release 2 instruction set has been added.
  The option -march=mips64r2 enables generation of these instructions.

- GCC now supports Cavium Networks' Octeon processor.  This support is
  available through the -march=octeon and -mtune=octeon options.

- GCC now supports STMicroelectronics' Loongson 2E/2F processors.  The
  canonical -march= and -mtune= names for these processors are
  loongson2e and loongson2f.
  

picochip
--------

Picochip is a 16-bit processor.  A typical picoChip contains over 250
small cores, each with small amounts of memory. There are three
processor variants (STAN, MEM and CTRL) with different instruction
sets and memory configurations and they can be chosen using the -mae
option.
  
This port is intended to be a "C" only port.


Power Architecture and PowerPC
------------------------------
  
- GCC now supports the e300c2, e300c3 and e500mc processors.

- GCC now supports Xilinx processors with a single-precision FPU.

- Decimal floating point is now supported for e500 processors.
  

S/390, zSeries and System z9/z10
--------------------------------
  
- Support for the IBM System z10 EC/BC processor has been added.  When
  using the -march=z10 option, the compiler will generate code making
  use of instructions provided by the General-Instruction-Extension
  Facility and the Execute-Extension Facility.
  

VxWorks
-------
  
- GCC now supports the thread-local storage mechanism used on VxWorks.
  

Xtensa
------
  
- GCC now supports thread-local storage (TLS) for Xtensa processor
  configurations that include the Thread Pointer option.  TLS also
  requires support from the assembler and linker; this support is
  provided in the GNU binutils beginning with version 2.19.
  

Documentation improvements
==========================

Other significant improvements
==============================


------------------------------------------------------------------------------
Please send FSF & GNU inquiries & questions to gnu@gnu.org. There are
also other ways to contact the FSF.

These pages are maintained by the GCC team.

For questions related to the use of GCC, please consult these web
pages and the GCC manuals. If that fails, the gcc-help@gcc.gnu.org
mailing list might help.  Please send comments on these web pages and
the development of GCC to our developer mailing list at gcc@gnu.org or
gcc@gcc.gnu.org.  All of our lists have public archives.

Copyright (C) Free Software Foundation, Inc.,
51 Franklin St, Fifth Floor, Boston, MA 02110, USA.

Verbatim copying and distribution of this entire article is
permitted in any medium, provided this notice is preserved.

Last modified 2009-04-21
